Statistical tests of gamma-distributed rate heterogeneity in models of sequence evolution in phylogenetics.

Authors

  • N Goldman
  • S Whelan
Abstract

Likelihood ratio tests (LRTs) for comparing models of sequence evolution have become popular over the last few years (Goldman 1993; Yang, Goldman, and Friday 1994, 1995; Huelsenbeck and Crandall 1997; Huelsenbeck and Rannala 1997). In their simplest form, such tests compare a simpler null hypothesis (H0) with a more complex alternative hypothesis (H1) which is a generalization of H0. H0 can be derived from H1 by fixing one or more of its free parameters at particular values, and the hypotheses are described as nested. Although it is also possible to test non-nested models (Goldman 1993), nested models are often preferred, as statistical tests are simpler to perform and their results can be easier to interpret. The test statistic for an LRT can be written as $2\delta = 2\ln(\hat{L}_{H_1}/\hat{L}_{H_0}) = 2(\ln(\hat{L}_{H_1}) - \ln(\hat{L}_{H_0}))$, where $\hat{L}_{H_0}$ and $\hat{L}_{H_1}$ are the maximum-likelihood (ML) scores under hypotheses H0 and H1, respectively. This statistic measures how much improvement H1 gives over H0, and when the hypotheses are nested, $2\delta$ will always be nonnegative. For these nested hypotheses, and under certain regularity conditions, the asymptotic distribution of $2\delta$ (i.e., for large amounts of data) will be $\chi^2_k$. Here, $k$ is the number of degrees of freedom by which H0 and H1 differ, that is, the number of free parameters of H1 whose values must be fixed to derive H0 (Wald 1949; Silvey 1975; Felsenstein 1981; Goldman 1993; Yang, Goldman, and Friday 1994, 1995). (In effect, each free parameter contributes a $\chi^2_1$ variate to the distribution of $2\delta$, with the sum of $k$ independent $\chi^2_1$ variates being distributed as $\chi^2_k$.) Statistical tests assessed using such $\chi^2$ distributions have now become a widespread and useful tool in phylogenetics (Huelsenbeck and Crandall 1997; Huelsenbeck and Rannala 1997).
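As a rough illustration of the test described above, the following Python sketch computes the LRT statistic from two maximized log-likelihoods and looks up its upper-tail probability under a chi-square distribution with k degrees of freedom. It is not taken from the article: the function names `chi2_sf` and `lrt` and the log-likelihood values in the usage note are hypothetical, and the sketch simply assumes that the asymptotic chi-square approximation holds.

```python
import math


def chi2_sf(x, k):
    """Upper-tail probability P(X > x) for a chi-square variate with k d.f.

    Uses the closed forms of the regularized upper incomplete gamma
    function Q(k/2, x/2) for integer and half-integer first arguments.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    if x <= 0:
        return 1.0
    h = x / 2.0
    if k % 2 == 0:
        a = 1.0
        q = math.exp(-h)              # Q(1, h)
    else:
        a = 0.5
        q = math.erfc(math.sqrt(h))   # Q(1/2, h)
    # Recurrence: Q(a + 1, h) = Q(a, h) + h^a * exp(-h) / Gamma(a + 1)
    while a < k / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q


def lrt(lnL_H0, lnL_H1, k):
    """Return (statistic, p-value) for nested hypotheses H0 and H1.

    lnL_H0, lnL_H1: maximized log-likelihoods (hypothetical inputs).
    k: number of free parameters of H1 fixed to obtain H0.
    Assumes the asymptotic chi-square distribution with k d.f. applies.
    """
    stat = 2.0 * (lnL_H1 - lnL_H0)
    return stat, chi2_sf(stat, k)
```

For example, with made-up scores `lrt(-1234.5, -1230.0, 1)` gives a statistic of 9.0 and a p-value of about 0.0027, so H0 would be rejected at the 5% level if the chi-square approximation is trusted.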
Recently, there has been renewed interest in testing whether the predicted $\chi^2$ distribution gives a reliable estimate of the true distribution of the LRT statistic $2\delta$ under realistic conditions (e.g., with finite sequence lengths). Whelan and Goldman (1999) investigated cases in which the competing hypotheses were different models of nucleotide substitution. Under three specimen experimental designs (representing realistic phylogenies and nucleotide substitution processes), we found that the $\chi^2$ distribution was acceptable for performing tests of the significance of parameters describing the relative rate of transition


Similar resources

PuMA: Bayesian analysis of partitioned (and unpartitioned) model adequacy

SUMMARY The accuracy of Bayesian phylogenetic inference using molecular data depends on the use of proper models of sequence evolution. Although choosing the best model available from a pool of alternatives has become standard practice in statistical phylogenetics, assessment of the chosen model's adequacy is rare. Programs for Bayesian phylogenetic inference have recently begun to implement mo...

A Gamma mixture model better accounts for among site rate heterogeneity

MOTIVATION Variation of substitution rates across nucleotide and amino acid sites has long been recognized as a characteristic of molecular sequence evolution. Evolutionary models that account for this rate heterogeneity usually use a gamma density function to model the rate distribution across sites. This density function, however, may not fit real datasets, especially when there is a multimod...

The Impact of Modelling Rate Heterogeneity among Sites on Phylogenetic Estimates of Intraspecific Evolutionary Rates and Timescales

Phylogenetic analyses of DNA sequence data can provide estimates of evolutionary rates and timescales. Nearly all phylogenetic methods rely on accurate models of nucleotide substitution. A key feature of molecular evolution is the heterogeneity of substitution rates among sites, which is often modelled using a discrete gamma distribution. A widely used derivative of this is the gamma-invariable...

Capturing heterotachy through multi-gamma site models

Most methods for performing a phylogenetic analysis based on sequence alignments of gene data assume that the mechanism of evolution is constant through time. It is recognised that some sites do evolve somewhat faster than others, and this can be captured using a (gamma) rate heterogeneity model. Further, some species have shorter replication times than others, and this results in faster rates ...

Estimating the Time of a Step Change in Gamma Regression Profiles Using MLE Approach

Sometimes the quality of a process or product is described by a functional relationship between a response variable and one or more explanatory variables, referred to as a profile. In most research in this area the response variable is assumed to be normally distributed; however, in certain applications the normality assumption is violated. In these cases the Generalized Linear Mod...


Journal:
  • Molecular biology and evolution

Volume 17, Issue 6

Pages: -

Publication date: 2000